STJ avoid full document string allocation #5908

Open
eMelgooG wants to merge 3 commits into Azure:main from eMelgooG:fix/stj-avoid-full-document-string-allocation

Refs: #4652 (comment)

Pull Request Template

Description

Both read paths in CosmosSystemTextJsonSerializer currently materialize the entire response document as a UTF-16 System.String before parsing. This PR feeds the UTF-8 bytes directly into Utf8JsonReader instead, eliminating the allocation and the redundant UTF-8 ↔ UTF-16 transcoding on every read.

Binary path: replace cosmosObject.ToString() (which produces a UTF-16 string) with cosmosObject.WriteTo(textJsonWriter) + JsonSerializer.Deserialize<T>(ReadOnlySpan<byte>, ...).
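
A minimal sketch of the idea behind the binary-path change, using only public System.Text.Json APIs. The `Item` class and `Utf8ReadSketch` helper are hypothetical, and the `Utf8JsonWriter` over an `ArrayBufferWriter<byte>` only stands in for whatever `cosmosObject.WriteTo(textJsonWriter)` produces inside the SDK; the actual serializer code is not reproduced here.

```csharp
using System;
using System.Buffers;
using System.Text;
using System.Text.Json;

// Hypothetical payload type, for illustration only.
public sealed class Item
{
    public string Id { get; set; }
    public int Count { get; set; }
}

public static class Utf8ReadSketch
{
    // Stand-in for the SDK writing a document out as UTF-8 JSON text.
    public static ReadOnlyMemory<byte> WriteUtf8Json(Item item)
    {
        var buffer = new ArrayBufferWriter<byte>();
        using (var writer = new Utf8JsonWriter(buffer))
        {
            writer.WriteStartObject();
            writer.WriteString("Id", item.Id);
            writer.WriteNumber("Count", item.Count);
            writer.WriteEndObject();
        } // Dispose flushes the writer into the buffer.
        return buffer.WrittenMemory;
    }

    // Old shape: transcode UTF-8 to a UTF-16 string, then parse the string.
    public static T ReadViaString<T>(ReadOnlyMemory<byte> utf8, JsonSerializerOptions options = null)
    {
        string json = Encoding.UTF8.GetString(utf8.Span); // full-document string allocation
        return JsonSerializer.Deserialize<T>(json, options);
    }

    // New shape: hand the UTF-8 bytes to the serializer directly.
    public static T ReadViaUtf8<T>(ReadOnlyMemory<byte> utf8, JsonSerializerOptions options = null)
    {
        return JsonSerializer.Deserialize<T>(utf8.Span, options);
    }
}
```

Both helpers should yield the same object for well-formed UTF-8 JSON; the second one skips the intermediate string.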

Text path (DeserializeStream<T>): replace StreamReader.ReadToEnd() + Deserialize<T>(string, ...) with JsonSerializer.Deserialize<T>(Stream, ...).
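
The text-path change can be sketched the same way. This assumes System.Text.Json 6.0 or later, where the synchronous `JsonSerializer.Deserialize<T>(Stream, ...)` overload is available; the `Doc` class and `StreamReadSketch` helper are hypothetical names and are not the SDK's `DeserializeStream<T>` implementation.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Json;

// Hypothetical payload type, for illustration only.
public sealed class Doc
{
    public string Id { get; set; }
    public int Count { get; set; }
}

public static class StreamReadSketch
{
    // Old shape: read the whole stream into a UTF-16 string, then parse it.
    public static T ReadViaString<T>(Stream stream, JsonSerializerOptions options = null)
    {
        using (var reader = new StreamReader(stream))
        {
            string json = reader.ReadToEnd(); // full-document string allocation
            return JsonSerializer.Deserialize<T>(json, options);
        }
    }

    // New shape: let the serializer read UTF-8 bytes from the stream itself.
    public static T ReadViaStream<T>(Stream stream, JsonSerializerOptions options = null)
    {
        using (stream)
        {
            return JsonSerializer.Deserialize<T>(stream, options);
        }
    }
}
```

For example, wrapping `Encoding.UTF8.GetBytes("{\"Id\":\"a1\",\"Count\":3}")` in a `MemoryStream` and passing it to either helper should produce a `Doc` with the same field values.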

Both changes are semantically equivalent for Cosmos response bodies (UTF-8 JSON) and have no public API impact. They reduce CPU usage, allocations, and GC/LOH pressure on every read.
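
As a rough illustration of the LOH point: .NET places objects of 85,000 bytes or more on the Large Object Heap, and a `System.String` stores two bytes per UTF-16 code unit. The helper below is a hypothetical sketch that compares payload sizes only; it ignores object headers and is not a measurement of the SDK.

```csharp
using System;
using System.Text;

public static class AllocationSizeSketch
{
    // Objects of this size or larger are allocated on the Large Object Heap.
    public const int LohThresholdBytes = 85_000;

    // Approximate character payload of a string: two bytes per UTF-16 code unit.
    public static long Utf16PayloadBytes(string json) => (long)json.Length * sizeof(char);

    // Size of the same text encoded as UTF-8.
    public static int Utf8PayloadBytes(string json) => Encoding.UTF8.GetByteCount(json);

    // True when the string's character data alone reaches the LOH threshold.
    public static bool StringReachesLoh(string json) => Utf16PayloadBytes(json) >= LohThresholdBytes;
}
```

Under these assumptions, a 50,000-character ASCII document occupies 50,000 bytes as UTF-8 but about 100,000 bytes as a string, so the intermediate string alone crosses the threshold.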

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)

Closing issues

n/a

Changelog

  • [x] I have added a changelog entry under ### Unreleased in changelog.md
    for the user-facing impact of this change.

eMelgooG and others added 3 commits May 28, 2026 16:54
Both read paths in CosmosSystemTextJsonSerializer materialize the entire
document as a System.String before parsing:

1. Binary path: cosmosObject.ToString() builds a UTF-16 JSON string from
   the UTF-8 bytes produced by IJsonWriter.GetResult() (via Utf8StringHelpers),
   only for JsonSerializer.Deserialize<T>(string, ...) to re-parse that UTF-16.
   Fix: feed the UTF-8 ReadOnlyMemory<byte> directly into
   JsonSerializer.Deserialize<T>(ReadOnlySpan<byte>, ...).

2. DeserializeStream<T> helper: StreamReader.ReadToEnd() transcodes UTF-8
   to a full UTF-16 string before Deserialize<T>(string, ...).
   Fix: call JsonSerializer.Deserialize<T>(Stream, ...) which feeds
   Utf8JsonReader directly from the UTF-8 byte stream.

Both changes are semantically equivalent for Cosmos response bodies (UTF-8
JSON) and eliminate full-document string allocations that land on the Large
Object Heap for non-trivial documents.

Refs: Azure#4652 (comment)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
eMelgooG commented May 28, 2026

@eMelgooG please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"

Contributor License Agreement

@microsoft-github-policy-service agree company="Microsoft"
